Simultaneous enhancement of multiple functional properties using evolution-informed protein design

A major challenge in protein design is to augment existing functional proteins with multiple property enhancements. Altering several properties likely necessitates numerous primary sequence changes, and novel methods are needed to accurately predict combinations of mutations that maintain or enhance function. Models of sequence co-variation (e.g., EVcouplings), which leverage extensive information about various protein properties and activities from homologous protein sequences, have proven effective for many applications, including structure determination and mutation effect prediction. We apply EVcouplings to computationally design variants of the model protein TEM-1 β-lactamase. Nearly all of the 14 experimentally characterized designs were functional, including one with 84 mutations from the nearest natural homolog. The designs also had large increases in thermostability, increased activity on multiple substrates, and a structure nearly identical to that of the wild-type enzyme. This study highlights the efficacy of evolutionary models in guiding large sequence alterations to generate functional diversity for protein design applications.

Correlation between predicted fitness (EVH) and experimentally quantified fitness of point mutations in WT TEM-1

[Panel title: Deep Mutational Scan replicate 1 at 2,500 µg/mL ampicillin; Spearman=0.717 (n=4,788)]

Supplementary Figure 1. Relationship between predicted fitness score and experimentally determined fitness effect of individual mutations from a published deep mutational scan¹. Each blue dot represents a single point mutation in WT TEM-1, of which there are 4,788 possible mutations over the 252 positions aligned in the multiple sequence alignment used for model generation and fitness prediction. The predicted fitness effect (∆EVH) on the x-axis is the predicted fitness of the point mutant minus the predicted fitness of WT TEM-1. The experimentally measured fitness score on the y-axis is defined in Stiffler et al.¹. Left: Experimental replicate 1, which quantified all possible mutations (n=4,788, Spearman=0.717). Right: Experimental replicate 2, which quantified all mutations in 251 positions (i.e., one fewer than replicate 1; n=4,769, Spearman=0.702). Source data are provided in the Source Data file.

Predicted fitness of designs compared to random WT TEM-1 variants

Supplementary Figure 2. Comparison of each design's predicted fitness with randomly generated sequences. For each design, 1 million random protein sequences were generated on the WT TEM-1 sequence background with the same number of mutations as the design. The predicted fitness of each design is indicated as an "X" and the distribution of the random sequences is shown as a violin plot. In general, the designs had a much higher predicted fitness than random variants. Example source data are provided in the Source Data file. Random data can be regenerated using scripts at https://github.com/gauthierscience/beta-lac-protein-design².
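The random-variant comparison in Supplementary Figure 2 can be illustrated with a minimal Python sketch that draws one sequence carrying a fixed number of substitutions on a wild-type background. The function name `random_variant`, the uniform choice of positions and replacement residues, and the toy sequence are assumptions made for illustration; the source does not state the sampling scheme, and the authors' own scripts are in the repository cited above. Scoring the sampled sequences with the EVcouplings model is not shown.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_variant(wt, n_mut, rng, positions=None):
    """Return a copy of `wt` with exactly `n_mut` substituted positions.

    Assumption: positions and replacement residues are drawn uniformly.
    `positions` optionally restricts which indices may be mutated
    (for example, only alignment-covered positions).
    """
    if positions is None:
        positions = range(len(wt))
    chosen = rng.sample(list(positions), n_mut)  # distinct positions
    seq = list(wt)
    for pos in chosen:
        # Exclude the wild-type residue so every chosen site really changes.
        seq[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != wt[pos]])
    return "".join(seq)


# Toy background sequence (hypothetical, not TEM-1).
toy_wt = "ACDEFGHIKLMNPQRSTVWY"
variant = random_variant(toy_wt, 5, random.Random(0))
print(variant, sum(a != b for a, b in zip(variant, toy_wt)))
```

Repeating the call with a seeded generator gives a reproducible background distribution against which a design with the same mutation count could be compared.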
Designs with their most similar natural homologs

Supplementary Figure 3. Multiple sequence alignments (MSAs) of each tested design with WT TEM-1 and its most similar homologs in the MSA used for model inference. For each design, homologs from the MSA were ranked by sequence identity (number of aligned positions with identical amino acids), and the top five were selected. Each row contains a single design, with the design sequence followed by WT TEM-1 and the ordered list of most similar homologs (most similar at the top). Mutation counts of the design relative to WT TEM-1 (wt) and the most similar homolog (h) are listed under the design name. Amino acid changes relative to the design are colored by new residue property (standard colors: green, hydrophobic and glycine (G); blue, negative charge; red, positive charge; light blue, polar). The logo for each position follows the same color scheme, with the design amino acid shown in gray. Dashes (-) represent gaps. Alignments generated by https://fast.alignmentviewer.org. Source data are provided in the Source Data file.

Property differences of positions mutated in at least one design compared to positions not mutated in any design

Supplementary Figure 4. Properties of positions mutated in any of the generated designs compared to non-mutated positions. Each panel contains two cumulative distribution functions (CDFs): positions mutated in one or more designs (red) and positions not mutated in any design (blue). Mutated and non-mutated positions were aggregated from all 38 of the generated designs: opt.a, opt.b, and the six designs generated for each of the six distance constraints (i.e., not only the designs that were experimentally characterized). The y-axis is the proportion of positions whose property value is less than or equal to the value shown on the x-axis.
[A] CDFs of positional conservation in the multiple sequence alignment used for model inference (maximum Shannon entropy of all positions minus the Shannon entropy at each position). Positions mutated in the designs were less likely to be conserved than positions not mutated.
[B] CDFs of relative surface accessibility from DSSP analysis (ACC field) of a published WT TEM-1 structure (PDB: 1XPB). Residues mutated in the designs were more likely to be surface accessible than non-mutated residues.
[C] CDFs of residue-residue interaction counts from a published WT TEM-1 structure (PDB: 1XPB). An interaction is defined as any atom in one residue lying within 5 angstroms of any atom in the other residue. Residues mutated in the designs generally had fewer interactions than non-mutated residues. Source data are provided in the Source Data file.

[Figure panel: Broth microdilution MIC assay, resistance to meropenem in E. coli; values shown as ≤0.12 (min conc tested).]

Supplementary Figure 13, panels [B] and [C]:
[B] MIC of ceftazidime. Gray bars: mean of three replicates (red diamonds).
[C] MIC of cephalothin. For cephalothin, several of the replicates exceeded the maximum concentration tested (256 µg/mL), so the gray bars are either the mode of the replicates or, if there was no mode, the median of the three replicates (red diamonds). The relative MICs of the sequences largely agreed with the broth microdilution resistance assays in the main text (Figure 4B). Source data are provided in the Source Data file.
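The aggregation rule described for cephalothin (the mode of the three replicates, otherwise their median) can be sketched as follows. The function name `aggregate_mic` and the use of `math.inf` to stand in for replicates above the 256 µg/mL ceiling are assumptions made for illustration, not part of the source.

```python
import math
from collections import Counter
from statistics import median

# Assumption: a replicate that exceeded the highest tested concentration
# is recorded as infinity so that it sorts above every measured value.
ABOVE_MAX = math.inf


def aggregate_mic(replicates):
    """Return the mode of three replicate MICs, or the median if no value repeats."""
    value, freq = Counter(replicates).most_common(1)[0]
    if freq > 1:
        return value  # at least two replicates agree
    return median(replicates)  # all replicates differ


print(aggregate_mic([ABOVE_MAX, ABOVE_MAX, 256]))  # two replicates above the ceiling
print(aggregate_mic([64, 128, 256]))               # no repeated value, median is used
```

With exactly three replicates the rule is unambiguous: either two or more values coincide, or the middle value is taken.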

Properties of mutated positions: comparison of mutation count, positional conservation, and surface accessibility for each set of distance-constrained designs

[Figure panels: one per distance threshold, each combining the two tested designs (e.g., 95.a, 95.b) with four untested designs at that threshold; x-axis label: count in the 6 designs; legend: single position in core of WT TEM-1, single position on surface of WT TEM-1.]

Supplementary Figure 5. Relationship of mutation count at each position to conservation and surface accessibility. Each dot represents a single position that was aligned in the natural multiple sequence alignment and available to be mutated in the design process (n=252). Each plot contains data from all six design sequences at the specified distance threshold (i.e., 98, 95, 90, 80, 70, 50). [X-axis]: the mutation count for each position tallied across the six designs. [Y-axis]: conservation, defined as the maximum Shannon entropy of all positions minus the Shannon entropy at each position. [Colors]: each dot (position) is colored by the relative surface accessibility of the position in the WT TEM-1 structure (PDB: 1XPB). Red indicates fully buried in the core of the protein and dark blue represents maximum surface accessibility. Intermediate accessibility is colored as shown in the colorbar, with the transition between red and blue set to the median surface accessibility value so that half of the positions are core and half exposed (i.e., half red and half blue). Source data are provided in the Source Data file.
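The conservation measure used in Supplementary Figures 4 and 5 (maximum Shannon entropy over all positions minus the entropy at each position) can be sketched in a few lines of Python. The function names, the log base 2, the unweighted sequence counts, and the treatment of gaps as an ordinary symbol are assumptions; the source does not specify these details.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation(msa):
    """Per-position conservation: max entropy over all columns minus column entropy.

    `msa` is a list of equal-length aligned sequences. Gaps ('-') are
    counted like any other symbol in this sketch.
    """
    entropies = [column_entropy(col) for col in zip(*msa)]
    max_entropy = max(entropies)
    return [max_entropy - h for h in entropies]


# Toy alignment: column 0 is invariant, column 1 is maximally variable.
toy_msa = ["AC", "AD", "AE", "AF"]
print(conservation(toy_msa))
```

Under this definition the most variable column in the alignment scores zero and fully invariant columns receive the highest score.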

Supplementary Figure 7. Ability of designs to confer resistance to ampicillin in E. coli. In addition to the broth microdilution assay in the main text (Figure 3), two independent assays were used to measure the minimum inhibitory concentration (MIC) of the canonical β-lactam substrate ampicillin. Resistance to ampicillin was assessed by [A] ability to form colonies across a serial dilution of ampicillin in MH agar, and [B] a MIC strip assay (Liofilchem). The maximum concentration of ampicillin tested, and therefore the maximum MIC resolution, in the MIC strip assay in [B] was 256 µg/mL (indicated in red). Gray bars: mean of three replicates (red diamonds). The relative MICs of the sequences largely agreed with the broth microdilution assay in the main text (Figure 3A). Source data are provided in the Source Data file.

Supplementary Figure 8. Nitrocefin hydrolysis product concentration over time as a function of initial substrate concentration. Each row is a tested enzyme. Each column is an initial substrate (nitrocefin) concentration. Dots are the absolute measured product concentration. Lines overlaid on the dots show the initial reaction rate (slope) used for fitting the Michaelis-Menten equation. Each replicate is shown in a different color (red: replicate 1, green: replicate 2, blue: replicate 3). No hydrolysis was detectable for neg.ctrl, consensus, and 70.b. Designs 50.a and 50.b could not be purified. Source data are provided in the Source Data file.

Nitrocefin: comparison of Michaelis-Menten fit to measured data

Supplementary Figure 9. Comparison of Michaelis-Menten fit to measured data. Each plot contains data from a single tested sequence. Y-axis: initial reaction rate (v; max slope of 4 time points). X-axis: substrate concentration (µM nitrocefin). Dots: measured data used to fit the Michaelis-Menten equation (red: replicate 1, green: replicate 2, blue: replicate 3). Purple line: fitted results (predicted initial reaction rate as a function of nitrocefin concentration) plotted to the maximum substrate concentration used for fitting (400 µM or 800 µM, Methods). No hydrolysis was detectable for neg.ctrl, consensus, and 70.b. Designs 50.a and 50.b could not be purified. Source data are provided in the Source Data file.

Supplementary Figure 10. Ampicillin hydrolysis: substrate concentration over time. Each plot contains data from a single tested sequence. X-axis: time in seconds. Y-axis: absorbance measured at 235 nm. Dots are measured data, and the overlaid lines show the initial rate (linear regression of five time points) for each of the three replicates. Each replicate is shown in a different color (red: replicate 1, green: replicate 2, blue: replicate 3). No hydrolysis was detectable for neg.ctrl, consensus, and 70.b. Designs 50.a and 50.b could not be purified. Source data are provided in the Source Data file.

Supplementary Figure 11. Melting curves showing fluorescence emission changes with temperature in a Differential Scanning Fluorimetry assay to quantify thermal stability. Purified protein was incubated with SYPRO Orange dye to monitor protein unfolding (Methods). Melting temperature (Tm) is quantified as the midpoint of the transition curve (vertical lines). Colors depict replicate 1 (red), replicate 2 (green), and replicate 3 (blue). Source data are provided in the Source Data file.

Resistance to cefoxitin, imipenem, and meropenem in E. coli assessed by broth microdilution MIC assay

For some β-lactam antibiotics, the designs showed no consistent differences from WT TEM-1 or the negative controls in their ability to confer resistance. Minimum inhibitory concentration (MIC) in E. coli was determined by a Clinical and Laboratory Standards Institute (CLSI) broth microdilution assay. [A] Resistance to cefoxitin, a second-generation cephamycin β-lactam antibiotic. [B] Resistance to imipenem and [C] meropenem, both of which are members of the carbapenem class of β-lactam antibiotics. The aggregated MIC calls (gray bars, see Methods) summarize three individual replicate experiments (red diamonds). Source data are provided in the Source Data file.
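The initial-rate estimates mentioned in the kinetics captions (a linear regression over five time points for ampicillin, and the maximum slope over four time points for nitrocefin) can be sketched in plain Python. The helper names and the toy trace are hypothetical, and reading "max slope of 4 timepoints" as the largest least-squares slope over any four consecutive points is an assumption about the authors' procedure, not a description of it.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def first_window_slope(times, values, window=5):
    """Slope of a linear regression over the first `window` time points."""
    return ols_slope(times[:window], values[:window])


def max_window_slope(times, values, window=4):
    """Largest regression slope over any `window` consecutive time points."""
    return max(
        ols_slope(times[i:i + window], values[i:i + window])
        for i in range(len(times) - window + 1)
    )


# Toy product-formation trace that is linear at first and then plateaus.
t = [0, 1, 2, 3, 4, 5]
product = [0, 2, 4, 6, 7, 7.5]
print(max_window_slope(t, product, window=4))
print(first_window_slope(t, product, window=5))
```

For a substrate-depletion signal such as absorbance at 235 nm the slope is negative, so a caller would take the most negative window slope (or negate the trace) before converting to a rate.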
Supplementary Figure 13. Ability of designs to confer resistance to the β-lactam antibiotics aztreonam, ceftazidime, and cephalothin in E. coli. Minimum inhibitory concentration (MIC) in E. coli was determined by a MIC strip assay (Liofilchem). [A] MIC of aztreonam. Gray bars: mean of three replicates (red diamonds).
Supplementary Figure 15. Difference Distance Matrix comparing 80.a (chain A) to WT TEM-1 (PDB: 1XPB chain A). The distance between each pair of Cα atoms was calculated in both the 80.a and WT TEM-1 structures, and the difference was then computed by subtracting the WT TEM-1 distance from the 80.a distance. The only substantially different region is a loop between positions 255-257. The Difference Distance Matrix for chain B of 80.a (compared to WT TEM-1) is nearly identical (not shown). Colors: Red indicates that the distance between a residue pair is larger in 80.a than in WT TEM-1. Blue indicates that the distance between a residue pair is larger in WT TEM-1 than in 80.a. Intermediate colors are quantified as shown in the colorbar. Source data are provided in the Source Data file.
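The Difference Distance Matrix calculation described in this caption can be illustrated with a short Python sketch. It assumes the Cα coordinates of the two structures have already been extracted and paired residue by residue; the function names and toy coordinates are hypothetical, and parsing of PDB files is not shown.

```python
import math


def distance_matrix(coords):
    """All-against-all Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance_matrix(design_ca, wt_ca):
    """Design distance minus wild-type distance for every pair of Cα atoms.

    Both inputs are lists of (x, y, z) tuples in the same residue order.
    Positive entries mean the pair is farther apart in the design.
    """
    if len(design_ca) != len(wt_ca):
        raise ValueError("structures must contain the same paired residues")
    d_design = distance_matrix(design_ca)
    d_wt = distance_matrix(wt_ca)
    n = len(wt_ca)
    return [[d_design[i][j] - d_wt[i][j] for j in range(n)] for i in range(n)]


# Toy example: the third atom moves 4 Å farther from the second.
wt = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
design = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 8.0, 0.0)]
ddm = difference_distance_matrix(design, wt)
print(ddm[1][2])
```

Because only internal distances are compared, no structural superposition is needed before computing the matrix.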
Supplementary Figure 17. Difference Distance Matrix comparing 70.a (chain A) to WT TEM-1 (PDB: 1XPB chain A). The distance between each pair of Cα atoms was calculated in both the 70.a and WT TEM-1 structures, and the difference was then computed by subtracting the WT TEM-1 distance from the 70.a distance. The only substantially different regions are two loops (positions 53-55 and 255-257). The Difference Distance Matrix for chain B of 70.a (compared to WT TEM-1) is nearly identical (not shown). Colors: Red indicates that the distance between a residue pair is larger in 70.a than in WT TEM-1. Blue indicates that the distance between a residue pair is larger in WT TEM-1 than in 70.a. Intermediate colors are quantified as shown in the colorbar. Source data are provided in the Source Data file.

Difference Distance Matrix comparing 80.b to WT TEM-1. The distance between each pair of Cα atoms was calculated in both the 80.b and WT TEM-1 structures, and the difference was then computed by subtracting the WT TEM-1 distance from the 80.b distance. The only substantially different region is a loop between positions 255-257. The Difference Distance Matrix for chain B of 80.b (compared to WT TEM-1) is nearly identical (not shown). Colors: Red indicates that the distance between a residue pair is larger in 80.b than in WT TEM-1. Blue indicates that the distance between a residue pair is larger in WT TEM-1 than in 80.b. Intermediate colors are quantified as shown in the colorbar. Source data are provided in the Source Data file.

Supplementary Table 1. Michaelis-Menten parameters of nitrocefin hydrolysis. All values are rounded to the nearest integer. The number following ± indicates the standard error, which was derived directly from the model fit (Python lmfit module) for kcat and Km, or calculated using error propagation for kcat/Km (Methods).

Supplementary Table 2. β-Lactamase crystallographic data and refinement statistics. Values in parentheses correspond to the statistics in the highest resolution bin. RMSD, root-mean-square deviation.
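The error propagation mentioned for kcat/Km in Supplementary Table 1 can be illustrated with the standard first-order formula for a ratio, in which the relative standard error of the ratio is the root sum of squares of the relative standard errors of kcat and Km. This sketch assumes the two errors are uncorrelated; the source defers the exact procedure to the Methods, so a covariance term may have been handled differently there. The function name and the numbers are hypothetical.

```python
import math


def ratio_with_error(kcat, kcat_se, km, km_se):
    """Return kcat/Km and its standard error by first-order propagation.

    Assumption: the errors on kcat and Km are independent, so no
    covariance term is included.
    """
    ratio = kcat / km
    relative_se = math.sqrt((kcat_se / kcat) ** 2 + (km_se / km) ** 2)
    return ratio, abs(ratio) * relative_se


# Hypothetical fit results, not values from the study.
efficiency, efficiency_se = ratio_with_error(kcat=100.0, kcat_se=10.0, km=50.0, km_se=5.0)
print(efficiency, efficiency_se)
```

In a nonlinear fit the estimates of kcat and Km are often correlated, so the independent-error result should be read as an approximation.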